AI Storage Ecosystem for the Data Center

NVIDIA CMX Context Memory Storage Platform

Rearchitecting inference storage for the next frontier of AI.

Overview

AI-Native Storage Accelerates Long-Context Inference at Scale

NVIDIA® CMX™ context memory storage is an AI‑native context tier for long‑context, multi‑turn, and agentic AI inference. Powered by the NVIDIA BlueField®‑4 storage processor, it extends GPU memory with a shared, pod‑level context tier optimized for ephemeral key-value (KV) cache. The platform provides a high‑bandwidth path that reduces latency, cost, and power overhead for large-scale inference workloads, helping deliver higher throughput and better power efficiency on NVIDIA Rubin platforms.

NVIDIA BlueField-4 Powers a New Class of AI-Native Storage for the Next Frontier of AI

NVIDIA CMX extends GPU capacity and enables high‑bandwidth KV‑cache sharing across rack‑scale AI systems. It delivers higher throughput and better power efficiency for long‑context, multi‑turn inference than traditional storage.

Products

AI-Native Storage Infrastructure, Integrated From End to End

From accelerated context memory and secure data movement to Ethernet fabrics and inference frameworks, NVIDIA CMX is the result of extreme co-design across compute, networking, storage, and software.

NVIDIA BlueField-4

The NVIDIA BlueField-4 platform powers NVIDIA CMX with breakthrough performance and efficiency. BlueField-4 manages Non-Volatile Memory Express (NVMe) solid-state drives (SSDs), runs storage services, and offloads data integrity and encryption for KV cache with high power efficiency. Its advanced compute capabilities and software-defined hardware accelerators for networking, storage, and security create a secure, energy-efficient infrastructure for every workload.

NVIDIA DOCA Memos

NVIDIA DOCA Memos is a BlueField-4- and CMX-optimized SDK that manages and shares KV cache across AI compute and CMX data nodes, exposing simple key-value APIs and turning Ethernet-attached flash into a pod-level cache tier. It delivers secure, low-latency access with hardware-accelerated integrity and encryption, so applications stay stateless while CMX handles KV-cache routing and reuse at scale.
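The "simple key-value APIs" described above can be pictured as a content-addressed put/get interface over KV-cache blocks. The sketch below is illustrative only: the class and method names are hypothetical and are not the actual DOCA Memos API, and an in-memory dict stands in for the Ethernet-attached flash tier.

```python
import hashlib


class KVCacheTier:
    """Illustrative pod-level KV-cache store (NOT the actual DOCA Memos API)."""

    def __init__(self):
        self._store = {}  # key -> serialized KV-cache block

    @staticmethod
    def block_key(model: str, token_ids: tuple) -> str:
        # Content-addressed key: identical model + token prefix always maps
        # to the same key, so any node in the pod can locate and reuse a
        # block that another node already computed.
        digest = hashlib.sha256(repr((model, token_ids)).encode()).hexdigest()
        return f"{model}/{digest}"

    def put(self, key: str, kv_block: bytes) -> None:
        self._store[key] = kv_block

    def get(self, key: str):
        # Returns None on a cache miss, signaling the caller to recompute.
        return self._store.get(key)


tier = KVCacheTier()
key = KVCacheTier.block_key("demo-model", (101, 2023, 17))
tier.put(key, b"serialized-kv-tensors")
```

Because the key is derived from the content rather than the writer, the application stays stateless, matching the routing-and-reuse model described above.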

NVIDIA Spectrum-X Ethernet Networking

NVIDIA Spectrum-X Ethernet provides the high-performance remote direct-memory access (RDMA) fabric for low-latency, high-bandwidth access to AI-native KV cache across the pod. Purpose-built for AI, Spectrum-X Ethernet uses advanced congestion control, adaptive routing, and lossless RDMA over Converged Ethernet (RoCE) to minimize jitter and tail latency, delivering consistent, repeatable performance in large, multi-tenant environments. This allows CMX to scale with predictable high performance, maximizing throughput and responsiveness for multi-turn, agentic inference workloads.

NVIDIA Dynamo

NVIDIA Dynamo is a distributed inference-serving framework that makes CMX and the underlying context storage tiers appear seamless across the pod, routing requests to where KV cache already resides. By handling KV-aware placement and reuse in the serving layer, Dynamo increases tokens per second, reduces time to first token, and enables pod-wide context reuse for multi-turn, multi-agent workloads.
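KV-aware placement of the kind described above can be sketched as routing each request to the node whose cache already holds the longest matching token prefix. This is a minimal illustration of the idea, not Dynamo's actual routing algorithm; the function and node names are hypothetical.

```python
def longest_prefix_hit(prompt_tokens, node_caches):
    """Route to the node whose cached prefix best matches the prompt.

    prompt_tokens: list of token IDs for the incoming request.
    node_caches:   {node_name: set of cached prefix tuples}.
    Returns (best_node, matched_length); (None, 0) on a total miss.
    """
    best_node, best_len = None, 0
    for node, prefixes in node_caches.items():
        for prefix in prefixes:
            n = len(prefix)
            # A prefix "hits" only if the prompt starts with exactly it.
            if n > best_len and tuple(prompt_tokens[:n]) == prefix:
                best_node, best_len = node, n
    return best_node, best_len


caches = {
    "node-a": {(1, 2, 3)},
    "node-b": {(1, 2, 3, 4, 5)},
}
node, hit = longest_prefix_hit([1, 2, 3, 4, 5, 6], caches)
```

Routing to `node-b` here means five of the six prompt tokens are served from cache, so only one token's KV state must be computed before decoding starts.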

Product Benefits

Accelerated Context Memory for Long‑Context AI

NVIDIA CMX introduces a dedicated context tier that improves sustained throughput and power efficiency for KV‑cache-intensive, long‑context workloads compared with traditional storage approaches.

Reclaim Power for Gigascale AI

Scale AI services with a highly efficient, KV-cache-optimized storage tier that reclaims power from storage, freeing more of the data center power budget for GPUs.

Maximize GPU Utilization, Throughput, and Responsiveness

Optimize data paths and reduce stalls by reusing precomputed KV cache from the CMX tier instead of recomputing it. This boosts tokens per second and throughput for multi-turn, agentic inference. CMX reduces time to first token and time to last token, so answers stream sooner and finish faster, even as models, context windows, and concurrency grow.
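The reuse-versus-recompute trade-off above can be made concrete with a back-of-envelope model of time to first token (TTFT): uncached prompt tokens must be recomputed during prefill, while cached tokens are fetched over the fabric. All numbers below (prefill rate, fabric bandwidth, KV bytes per token) are illustrative assumptions, not measured CMX figures.

```python
def ttft_seconds(prompt_tokens, cached_tokens,
                 prefill_tok_per_s, fetch_gb_per_s, bytes_per_token):
    """Back-of-envelope TTFT model (all parameters are assumptions).

    Uncached tokens pay full prefill compute; cached tokens only pay
    the cost of fetching their KV blocks over the fabric.
    """
    recompute = (prompt_tokens - cached_tokens) / prefill_tok_per_s
    fetch = cached_tokens * bytes_per_token / (fetch_gb_per_s * 1e9)
    return recompute + fetch


# Hypothetical 100K-token prompt: cold start vs. 90% prefix already cached.
# Assumes 20K tok/s prefill, 50 GB/s fetch path, ~160 KB of KV per token.
cold = ttft_seconds(100_000, 0, 20_000, 50, 160_000)
warm = ttft_seconds(100_000, 90_000, 20_000, 50, 160_000)
```

Under these assumptions the cold start takes 5.0 s of prefill, while the warm case drops to under 0.8 s, since fetching cached KV blocks is far cheaper than recomputing them.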

Enable Smart, Efficient KV-Cache Sharing Across the AI Pod

Provide high-speed, pod-wide access to AI-native context so multi-turn agents can coordinate, share state, and scale seamlessly as workloads grow, while reducing KV-cache duplication and stranded capacity across nodes.

Extend GPU Capacity for Long-Context Inference

Deliver massive KV-cache capacity to support long-context reasoning, multi-agent workflows, and trillion-parameter models, with longer context windows for many simultaneous users.

NVIDIA STX

NVIDIA STX is a modular reference architecture for AI storage, co-designed with leading storage partners and built on NVIDIA accelerated compute, networking, and AI software. NVIDIA STX provides the foundation for building a universal data engine that accelerates the full AI lifecycle, from training and analytics to real-time agentic inference.

Ecosystem

NVIDIA CMX Context Memory Storage Partners

Resources

Building Blocks for the Context Era

NVIDIA BlueField-4 STX Storage Platform Launches With Broad Industry Adoption

NVIDIA STX is a modular AI storage reference design co‑developed with leading providers and built on NVIDIA accelerated compute, networking, and AI software. Learn how it powers the NVIDIA BlueField‑4 STX storage platform that supercharges agentic AI and AI data infrastructure.

Introducing the NVIDIA BlueField-4-Powered Context Memory Storage Platform

A new class of AI-native storage infrastructure uses BlueField to eliminate inference GPU stalls, improve power efficiency, and enable high-speed KV-cache sharing, so inference infrastructure can scale.

NVIDIA CMX Context Memory Storage Platform Solution Overview

NVIDIA CMX provides an optimized, high‑bandwidth path that reduces latency, cost, and power overhead compared with general‑purpose storage approaches, helping deliver up to 5x higher throughput and up to 5x better power efficiency.

Get Started

Collaborate With NVIDIA Experts

Connect with the NVIDIA enterprise sales team or the right partner in the NVIDIA Partner Network (NPN) program to get started.

Need Help Selecting the Right Partner or Product?

Talk to an NVIDIA specialist about your business needs.

Stay Up to Date on NVIDIA News

Sign up to get the latest news, updates, and more from NVIDIA.